The vast majority of microorganisms in the environment
remain uncultured, and their existence is known only from
sequences retrieved by PCR. As a consequence, our
understanding of the ecological function of dominant microbial
populations in the environment is limited. We will review
microbial diversity studies and show that these may have
moved from an extreme underestimation to a potentially
severe overestimation of diversity. The latter results from a
simple PCR-generated artifact: the cloning of heteroduplex
molecules followed by Escherichia coli mismatch repair,
which may generate an exponential increase in observed
sequence diversity. However, simple modifications to
current PCR amplification protocols minimize such artifactual
sequences and may bring within our reach estimation of
bacterial diversity in environmental samples. Such
estimates may spur new culture-independent approaches based
on genomic and microarray technology, allowing
correlation of phylogenetic identity with the ecological function of
unculturable organisms. In particular, we are developing a
DNA microarray that enables identification of individual
populations active in utilization of specific organic
substrates. The array consists of 16S and 23S rDNA-targeted
oligonucleotides and is hybridized to RNA extracted from
samples incubated with 14C-labeled organic substrates.
Populations that metabolize the substrate can be identified
by the radiolabel incorporated in their rRNA after only one
to two cell doublings, ensuring realistic preservation of
community structure. Thus, the microarray approach may
provide a powerful means to link microbial community
structure with in situ function of individual populations.

The last two decades have seen a radical shift in our
understanding of microbial diversity. Previously the number
of bacterial and archaeal species had been estimated to be in
the thousands. It is now generally accepted that the number
may actually be as high as several million (Torsvik et al.,
2002). This change was brought about by a gradual
replacement of diversity estimates based on pure-culture isolation
of strains with a determination of diversity based on
co-occurring gene sequences, largely ribosomal RNA (rRNA)
genes (Head et al., 1998). Although early attempts were
made to screen diversity by shotgun cloning of
environmental DNA, with subsequent detection and sequencing of
rRNA gene inserts (Schmidt et al., 1991), large-scale
application of the molecular approach was dependent on PCR
protocols that allow the enrichment of rRNA genes from
genome mixtures using universal primers (Head et al.,
1998). Today, the assessment of the entire diversity of
rRNA sequences (ribotypes) coexisting within specific
microbial communities has become a realistic possibility due
to the ease of PCR implementation and the increased
availability of high-throughput sequencing facilities.

Although the exact magnitude of microbial diversity still
remains an open question, the PCR-based approach has led
to the retrieval of large numbers of sequences from almost
any environment examined (Hugenholtz et al., 1998). Thus
extensive comparative databases are now available from
which patterns of microbial community structure are
beginning to emerge. For example, studies of bacterioplankton
diversity in the ocean, which represents one of the
best-studied environments, have shown that, surprisingly, the
major phylogenetic groups in the open ocean and the coastal
ocean are similar, despite marked differences in trophic
state and habitat quality between these environments
(Giovannoni and Rappé, 2000). Such observations have
provided important insights; yet they also highlight the major
problem of the molecular diversity approach: because very
few of the retrieved sequences have closely related cultured
representatives available, the ecological role of an organism
in question cannot even be guessed (Hugenholtz et al.,
1998). Furthermore, even when closely related cultured
organisms exist, they can display quite significant genomic,
physiological, or metabolic differences (Gray and Head,
2001). New alternatives for diversity studies, such as
analysis of large genome fragments retrieved from the
environment (Béjà et al., 2000) or gene cassette PCR for recovery
of complete open reading frames from environmental DNA
(Stockes et al., 2001), can enhance our understanding of
uncultured organisms. Nonetheless, elucidation of
structure-function relationships or niche differentiation of populations
within microbial communities remains one of the big
challenges in microbial ecology.

During the last few years, molecular diversity studies
have been augmented with tracer techniques that allow
assignment of biogeochemical function to uncultured
microbial populations [recently reviewed by Gray and Head
(2001)]. Most notably, combined microautoradiography and
in situ hybridization (STAR- or MICROFISH) (Lee et al.,
1999; Ouverney and Fuhrman, 1999; Cottrell and
Kirchman, 2000) or stable isotope probing (Boschker et al., 1998;
Radajewski et al., 2000) allow identification of microbial
populations responsible for the metabolism of specific
organic compounds. In both cases, environmental samples are
incubated with isotopically labeled substrates. In STAR- or
MICROFISH, microautoradiography and in situ
hybridization are carried out on the same microscope slide with the
goal of matching uptake of radiochemicals with
phylogenetic identification on the single-cell level. In stable isotope
probing, either lipid biomarkers (Boschker et al., 1998) or
DNA (Radajewski et al., 2000) is extracted from
communities incubated with 13C-labeled compounds. If cells grow
on the added compounds, their pool of macromolecules will
be isotopically heavy compared to those of metabolically
inactive organisms. This makes it possible to identify the
organism in one of two ways: (1) by mass spectrometry of
labeled “signature” lipids (Boschker et al., 1998); or (2) by
separation by ultracentrifugation of community DNA
according to mass differences, followed by identification of
rRNA genes in the isotopically heavy DNA pool by PCR,
cloning, and sequencing (Radajewski et al., 2000).

The above approaches have already produced interesting
insights into the ecological roles of uncultured Bacteria and
Archaea. For example, using MICROFISH, it was
demonstrated that low-temperature Archaea, which represent a
dominant group in deep ocean water but are currently
known only from rRNA gene clone libraries, readily take up
amino acids at low ambient concentrations (Ouverney and
Fuhrman, 1999). In another study, incubation of anaerobic
sediments with 13C-acetate yielded signature lipids of
gram-positive bacteria rather than those of the more readily
isolated delta Proteobacteria sulfate-reducing bacteria
(Boschker et al., 1998). However, each of the techniques has
distinct drawbacks. Stable isotope probing requires very
high substrate concentrations and long incubation times, so
that the procedure actually resembles enrichment cultures
(Radajewski et al., 2000); MICROFISH involves
labor-intensive hybridization and microautoradiography, which
limits the number of populations whose metabolism can be
explored.

We are currently developing a combination of DNA
microarrays and radiotracer incubations, the “functional
diversity array,” as a high-throughput complement to the
above methods (Bertilsson and Polz, 2001). DNA
microarrays can carry hundreds to thousands of specific nucleic acid
probes, which are arrayed in discrete spots. During the
hybridization process, these probes capture their specific
target from mixtures of templates. If these are either
radioactively or fluorescently labeled, the presence or, to a
certain extent, the quantity of all specific templates for which
probes have been spotted can be ascertained. The
application of DNA microarrays for screening and monitoring
microbial community structure by arraying rRNA-specific
oligonucleotide probes is being explored by a number of
laboratories (Cho and Tiedje, 2001; Small et al., 2001;
Koizumi et al., 2002). In the functional diversity array,
diversity screening is combined with detection of
populations responsible for specific transformations in the
community (Fig. 1). Samples are spiked with 14C-labeled
compounds, leading to incorporation of radionuclides into
rRNA of populations that actively metabolize the compound
of interest (Bertilsson and Polz, 2001). RNA is subsequently
extracted, fluorescently labeled, and hybridized to the
microarray, which contains oligonucleotide probes specific for
each “ribotype” in the community. Radioactivity in each
spot due to hybridization can be determined by either
microautoradiography or phosphor-imaging, so that in
combination with the fluorescent signal, a specific activity can be
estimated for each population (Fig. 1).
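
The per-spot calculation described above can be sketched as follows.
This is only an illustration: it assumes that specific activity is taken
as the ratio of the radioactive signal to the fluorescent signal of the
same spot, and the class name, field names, threshold, and example
values are all hypothetical rather than taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    probe: str            # hypothetical ribotype label for the arrayed probe
    radioactivity: float  # 14C signal of the spot (e.g., from phosphor-imaging)
    fluorescence: float   # fluorescent signal, used here as a proxy for bound rRNA


def specific_activity(spot, min_fluorescence=1e-9):
    """Return radioactivity per unit fluorescence, or None if no rRNA was detected.

    The ratio is an assumed definition of "specific activity" for illustration.
    """
    if spot.fluorescence <= min_fluorescence:
        return None  # no hybridized rRNA, so the ratio is undefined
    return spot.radioactivity / spot.fluorescence


# Made-up signals for three spots.
spots = [
    Spot("ribotype_A", 12.0, 400.0),
    Spot("ribotype_B", 0.5, 800.0),
    Spot("ribotype_C", 0.0, 0.0),
]

# Rank the detected populations from most to least active per unit rRNA.
ranked = sorted(
    (s for s in spots if specific_activity(s) is not None),
    key=specific_activity,
    reverse=True,
)
for s in ranked:
    print(s.probe, specific_activity(s))
```

Normalizing by the fluorescent signal separates abundant but inactive
populations (strong fluorescence, little label) from rare but active
ones, which is the distinction the combined readout is meant to provide.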

For the functional diversity array to be generally
applicable, differentiation of populations by the arrayed probes is
not, by itself, sufficient. In addition, several critical
questions must be evaluated. First, what is the detection limit for
14C-labeled rRNA hybridized to the array? Second, can
realistic substrate concentrations be used, and what are the
kinetics of rRNA synthesis after uptake of label under
environmental conditions? Third, to what extent can an
entire microbial community be represented on the array?
Below, we evaluate these questions, with special emphasis
on approaches for studying rRNA gene diversity in
microbial communities as a necessary precondition for
determining biogeochemical activity of previously unidentified
populations.

Figure 1. Outline of the experimental approach used for the functional diversity array.

Quantification of the radioactive signal on arrays shows
that the approach is sensitive enough to detect populations
at naturally occurring levels. The use of phosphor-imaging
screens, to which the entire array is exposed, allows 14C
bound to each spot to be quantified. We have experimentally
determined the detection limit for 14C on these screens to be
0.1 DPM for a spot 150 µm in diameter, which consists of
about 10^8 oligonucleotide probes. Assuming that the rRNA
molecules fragment to about 300 bp and that only 1% of the
oligonucleotide probes will be bound to templates after
hybridization, the detection limit is between 10^2 and 10^3
cells. This estimate is based on a cellular rRNA content
between 1,000 and 10,000 molecules, which is in the range
of slow- and fast-growing cells, respectively. Using this
procedure, we have been able to specifically detect and
differentiate rRNA from sulfate-reducing strains grown on
14C-labeled lactate (Klepac and Polz, unpubl. data).
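
One way to read the arithmetic behind this estimate is sketched below.
It assumes that each bound probe captures one rRNA fragment and counts
one rRNA molecule per fragment; it does not model how much 14C each
molecule must carry to reach the 0.1 DPM limit. The function name and
the simplifications are ours, not the authors' calculation.

```python
def cells_at_probe_saturation(probes_per_spot, fraction_bound, rrna_per_cell):
    """Number of cells whose rRNA would occupy the bound fraction of a spot.

    Simplifying assumption: one rRNA molecule per bound probe.
    """
    bound_probes = probes_per_spot * fraction_bound
    return bound_probes / rrna_per_cell


PROBES_PER_SPOT = 1e8   # about 10^8 probes in a 150 µm spot (from the text)
FRACTION_BOUND = 0.01   # 1% of probes bound after hybridization (from the text)

# Cellular rRNA content spans slow-growing (1,000) to fast-growing (10,000) cells.
fast_growing = cells_at_probe_saturation(PROBES_PER_SPOT, FRACTION_BOUND, 10_000)
slow_growing = cells_at_probe_saturation(PROBES_PER_SPOT, FRACTION_BOUND, 1_000)

print(f"fast-growing cells: ~{fast_growing:.0f}")
print(f"slow-growing cells: ~{slow_growing:.0f}")
```

Under these assumptions the sketch gives on the order of 10^2 cells for
fast-growing populations and 10^3 cells for slow-growing ones, so the
tenfold range in cellular rRNA content maps directly onto a tenfold
range in the number of cells needed.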

For the array to represent the actual microbial populations
responsible for metabolism of a specific compound, realistic
substrate concentrations must be used in the incubations to
avoid introducing a major bias in community structure due
to selective growth of specific populations. In coastal
waters, which we are using as a model ecosystem, we found
that even low additions of 14C-labeled organic substrates
(representing <3% of the total organic carbon) produced
highly radiolabeled rRNA after 7 h incubation at in situ
conditions. This incubation time is similar to the average
generation time for the entire bacterial community (9 h), and
both the uptake rate and the growth yield on the labeled
substrates were linear during the incubation, suggesting that
there were no major shifts in the microbial community.
These tests also showed that the proportion of labeled C
allocated to rRNA was strongly dependent on the quality of
the substrate (e.g., 12%-19% for adenine, 1.1%-1.3% for
acetate). In addition, tests with exponentially growing
bacteria in pure culture showed that a constant fraction of the
total cellular 14C (average 8% for Vibrio cholerae and 17%
for E. coli) could be recovered in rRNA after about one cell
doubling. Thus, the major advantages of rRNA detection are
the linearity of the labeling process and the possible
limitation to few cell doublings, which ensure that community
structure will be only minimally biased.

The third question, whether rRNA diversity can be
ascertained with realistic effort, requires a reexamination of
the PCR-based approach. We have recently presented the
hypothesis that a simple, PCR-induced artifact may lead to
severe overestimation of diversity of rRNA genes
(Thompson et al., 2002). During the co-amplification of
homologous templates with universal primers, a significant fraction
(up to 50%) of products may be present as heteroduplexes.
These were increasingly prevalent as template diversity
increased or primer availability became limiting (Thompson
et al., 2002). After cloning, heteroduplex molecules may
become subject to mismatch repair by the E. coli MutHLS
system. This can theoretically lead to independent repair of
each mismatched position since the repair system, in vivo, is
directed by hemimethylation (Modrich, 1987), which is
absent in PCR products. A model exploring the effects of
heteroduplex repair demonstrated that the undirected repair
process might be responsible for large overestimation of
rRNA diversity (Thompson et al., 2002). For example, a
simple system of 2 sequences with 3 shared mismatched
positions can result in 8 sequence permutations; for 4
sequences with 10 shared mismatched positions, the number
increases to 6136 (Thompson et al., 2002). Although this is
a dramatic example, the potential contribution of
heteroduplex repair to sequence diversity can easily be avoided by
“reconditioning PCR,” a low-cycle-number reamplification
of a 10-fold diluted, mixed-template PCR product.
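
The two-sequence case can be illustrated with a toy enumeration. It
assumes that every mismatched position in a heteroduplex is repaired
independently toward either strand; the sequences and the function name
are invented for the example. It is not the model of Thompson et al.
(2002) and makes no attempt to reproduce the four-sequence figure,
which depends on how that model pairs the templates.

```python
from itertools import product


def repair_products(strand_a, strand_b):
    """Enumerate sequences that undirected repair of one heteroduplex could yield.

    At each mismatched position the repaired base may come from either strand;
    matched positions are left unchanged.
    """
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must be aligned and of equal length")
    choices = [(a,) if a == b else (a, b) for a, b in zip(strand_a, strand_b)]
    return {"".join(bases) for bases in product(*choices)}


# Two invented sequences differing at three positions (2nd, 5th, and 8th base).
seq_a = "ACGTACGT"
seq_b = "ATGTCCGA"

mosaics = repair_products(seq_a, seq_b)
print(len(mosaics))                          # 2**3 = 8 sequence types
print(len(mosaics - {seq_a, seq_b}))         # 6 of them are artifactual
```

Because each mismatch doubles the number of possible outcomes, two
parental sequences with n mismatches can give rise to 2^n sequence
types, only two of which were present in the sample.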

Although the exact contribution of heteroduplex repair to
diversity estimates is still being analyzed in our laboratory,
we have used the modified amplification protocol
(reconditioning PCR) to estimate bacterial diversity in a coastal
bacterial community. We generated a clone library from
amplified 23S rRNA genes, then assessed sequence
diversity in the library by a combination of rarefaction analysis
and Chao-1 estimators, which are based on
capture-recapture statistics (Hughes et al., 2001). The results
demonstrated that diversity was relatively moderate, with the
number of coexisting sequence types remaining in the low 100s
(Acinas, Hunt, Bertilsson, and Polz, unpubl. data). This
ongoing analysis is currently being complemented with
rarefaction of a 16S rRNA gene library derived from the same
sample, demonstrating that it may indeed be possible to
represent entire communities with reasonable effort on
DNA microarrays.
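
For readers unfamiliar with these estimators, a minimal sketch is given
below. It uses the commonly cited form of Chao-1 and the standard
analytical rarefaction formula; the exact variants applied to the clone
library are not stated here, and the clone counts in the example are
invented.

```python
from math import comb


def chao1(counts):
    """Chao-1 richness estimate from per-sequence-type clone counts.

    S_chao1 = S_obs + F1^2 / (2 * F2), where F1 and F2 are the numbers of
    types seen exactly once and twice. The bias-corrected form is used
    when no type was seen exactly twice.
    """
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def rarefied_richness(counts, n):
    """Expected number of types in a random subsample of n clones."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must lie between 0 and the library size")
    # For each type, add the probability that it appears at least once.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts if c > 0)


# Invented library: 7 sequence types among 15 clones.
library = [5, 3, 2, 2, 1, 1, 1]

print(chao1(library))
for n in (5, 10, 15):
    print(n, round(rarefied_richness(library, n), 2))
```

A rarefaction curve that flattens as n approaches the library size,
together with a Chao-1 estimate close to the observed number of types,
suggests that most of the sequence types in the sample have been
recovered.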